Psychometric properties of an innovative smartphone application to investigate the daily impact of hypoglycemia in people with type 1 or type 2 diabetes: The Hypo-METRICS app

Introduction. The aim of this study was to determine the acceptability and psychometric properties of the Hypo-METRICS (Hypoglycemia MEasurement, ThResholds and ImpaCtS) application (app): a novel tool designed to assess the direct impact of symptomatic and asymptomatic hypoglycemia on daily functioning in people with insulin-treated diabetes.
Materials and methods. 100 adults with type 1 diabetes mellitus (T1DM, n = 64) or insulin-treated type 2 diabetes mellitus (T2DM, n = 36) completed three daily ‘check-ins’ (morning, afternoon and evening) via the Hypo-METRICS app across 10 weeks, to respond to 29 unique questions about their subjective daily functioning. Questions addressed sleep quality, energy level, mood, affect, cognitive functioning, fear of hypoglycemia and hyperglycemia, social functioning, and work/productivity. Completion rates, structural validity, internal consistency, and test-retest reliability were explored. App responses were correlated with validated person-reported outcome measures to investigate convergent (rs>±0.3) and divergent (rs<±0.3) validity.
Results. Participants’ mean±SD age was 54±16 years, diabetes duration was 23±13 years, and most recent HbA1c was 56.6±9.8 mmol/mol. Participants submitted mean±SD 191±16 out of 210 possible ‘check-ins’ (91%). Structural validity was confirmed with multi-level confirmatory factor analysis showing good model fit on the adjusted model (Comparative Fit Index >0.95, Root-Mean-Square Error of Approximation <0.06, Standardized Root-Mean-Square Residual <0.08). Scales had satisfactory internal consistency (all ω≥0.5), and high test-retest reliability (rs≥0.7). Convergent and divergent validity were demonstrated for most scales.
Conclusion. High completion rates and satisfactory psychometric properties demonstrated that the Hypo-METRICS app is acceptable to adults with T1DM and T2DM, and a reliable and valid tool to explore the daily impact of hypoglycemia.


Introduction
Hypoglycemia remains a daily threat for most people with type 1 diabetes (T1DM) or insulin-treated type 2 diabetes (T2DM). Hypoglycemia impacts on many areas of daily life including sleep duration and quality, mood, cognition, and productivity, and can in extreme cases lead to coma or even death [1][2][3][4][5]. Both the experience of hypoglycemia and living with the risk of hypoglycemia (including prevention and treatment) can present a significant burden for people with diabetes [6].
Previous studies on the impact of hypoglycemia have important limitations. These include recall bias for retrospective measures [7], low ecological validity for hospital-based studies [8], limited insight into the impact of asymptomatic episodes detected by continuous glucose monitoring (CGM) [9], and scarce investigation of impact beyond health status and fear of hypoglycemia [10,11]. The Hypo-METRICS (Hypoglycemia MEasurement, ThResholds and ImpaCtS) study attempts to address these limitations and further our understanding of hypoglycemia in its different forms, i.e. symptomatic and asymptomatic, severe and self-treated, and while awake and during sleep [12]. To capture data temporally closer to the hypoglycemic episodes as they occur in real-life, an ecological momentary assessment smartphone application (Hypo-METRICS app) in which experiences are repeatedly captured in real-time and in a usual environment, has been developed. The app prompts participants (morning, afternoon, and evening) to respond to questions within a few hours of any hypoglycemic episode occurring in their everyday lives.
To determine whether broad use of the Hypo-METRICS app in research and practice is indicated, this study investigates the app's acceptability (completion rates) and psychometric characteristics (structural validity, internal consistency reliability, test-retest reliability, convergent and divergent validity). These investigations are essential instrument validation steps that are required before we can address the key objectives of the Hypo-METRICS study (outlined in Divilly et al. [13]) in future publications.

Study participants & procedure
Hypo-METRICS is part of the EU IMI2-funded Hypo-RESOLVE (Hypoglycemia-Redefining SOLutions for better liVEs) project [12,13], across five European countries (Austria, Denmark, France, the Netherlands, and the United Kingdom). The Hypo-METRICS clinical study has received ethical approval at the lead site from the South Central Oxford B Research Ethics Committee (20/SC/0112) and in the other European countries (Ethikkommission der Medizinischen Universität Graz (Austria), Videnskabsetisk Komite for Region Hovedstaden (Denmark), Comité De Protection Des Personnes SUD Mediterranne IV (France), and Commissie Mensgebonden Onderzoek Regio Arnhem-Nijmegen (the Netherlands)).
Eligible participants were aged 18-85 years, had experienced at least one self-reported hypoglycemic episode in the past 3 months, and were willing to complete the app three times per day and wear a blinded CGM for 10 weeks. Informed written consent was obtained from all participants. Participants from the following three groups were recruited: a) T1DM with intact awareness of hypoglycemia (Gold score <4 [14]), b) T1DM with impaired awareness of hypoglycemia (Gold score ≥4 [14]), and c) T2DM managed with ≥1 insulin injection per day. The sample for the present study was determined a priori and consists of the first 100 participants to complete the Hypo-METRICS study. This sample size was based on guidelines suggesting an item-to-participant ratio of 1:10 [15] for evaluation of structural validity, and on the requirement to confirm the conceptual model fit before analyses could proceed for central study objectives. Before and after the 10-week period, participants completed an online survey (via Qualtrics, Provo, UT) with person-reported outcome measures (PROMs), and demographic and clinical characteristics were collected.

Study measures
The Hypo-METRICS app. The app consists of seven modules with 29 unique questions (see S1 Table), most of which use interval response scales (range 0-10, e.g., "not at all (0)" to "extremely (10)"). The app questions were designed to examine the impact of hypoglycemia on various domains of daily functioning, across three "check-ins": morning, afternoon, and evening (S1 Table). There was a "Skip question" option for each question. The app was developed in English (UK) and translated into Danish, Dutch, French, and German following the ISPOR guidance [16]. Further details about the app design and development are published elsewhere [17].
Additional PROMs. Validated PROMs were included for the purpose of validating the Hypo-METRICS app. Two of these were also completed via the app platform on a weekly basis, while the remaining PROMs were completed via Qualtrics (Qualtrics, Provo, UT) at baseline and end of study (see Table 1). PROMs were selected to examine constructs (e.g., 'mood' or 'cognitive functioning') either similar to those measured by the app, for convergent validity, or dissimilar, for divergent validity. Moderate-to-high correlations (rs>±0.3) were expected to evidence convergent validity, and low or no correlations (rs<±0.3) were expected for divergent validity.
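The cut-offs above can be expressed as a small decision rule. The following Python sketch is illustrative only and is not the authors' analysis code: the function name is hypothetical, and the handling of a correlation of exactly ±0.3 (labelled "boundary") is an assumption, since the text does not specify that case.

```python
def validity_label(rs: float) -> str:
    """Label a Spearman correlation using the 0.3 cut-off stated in the text.

    Assumption: |rs| exactly equal to 0.3 is not covered by the text,
    so it is reported separately as "boundary".
    """
    magnitude = abs(rs)  # the sign is ignored, matching the ">±0.3" notation
    if magnitude > 0.3:
        return "convergent"   # moderate-to-high correlation
    if magnitude < 0.3:
        return "divergent"    # low or no correlation
    return "boundary"


if __name__ == "__main__":
    # Illustrative values, not study results.
    for value in (0.45, -0.62, 0.12):
        print(value, validity_label(value))
```

Whether a given label supports validity still depends on the hypothesis made for that scale and PROM pair: a "convergent" label on a pair expected to be unrelated counts against divergent validity.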
Demographic & clinical data. At the start of the study, age, gender, employment status, education, medical history, previous episodes of hypoglycemia, HbA1c and method of glucose monitoring were recorded by the study site personnel and entered into an electronic database (REDCap, Vanderbilt, USA).

Statistical analysis
Statistical analyses were conducted with RStudio [27]. Descriptive statistics were used to determine sample characteristics, completion rates, distribution of the data, and floor and ceiling effects. In case of non-normality of question responses, non-parametric tests (Spearman's rho, rs) were applied. Between-person variance was examined using the Intraclass Correlation Coefficient (ICC) [28] and day-to-day variability in scores was examined using the Root Mean Squared Successive Difference (RMSSD) [29]. To examine structural validity, a multi-level confirmatory factor analysis (MCFA) was conducted. The following indices and values were used as indication of good global model fit: comparative fit index (CFI) >0.95, Tucker Lewis index (TLI) >0.95, Standardized Root-Mean-Square Residual (SRMR) <0.08 and Root-Mean-Square Error of Approximation (RMSEA) <0.06 [30,31]. Items were included in the MCFA if they: 1) were asked every day irrespective of whether the participant experienced a hypoglycemic episode, and 2) were not part of the work and productivity module of the app, where most items are relevant only to participants engaged in paid work. With 100 participants and a maximum of 10 unique items per check-in, a 1:10 item-to-participant ratio was considered acceptable for conducting factor analysis [15]. Internal consistency reliability of the scales was calculated using McDonald's ω, with values of ω>0.7 considered to indicate satisfactory internal consistency [32]. Question responses were analyzed according to check-in time: morning, afternoon or evening. These were analyzed separately to account for potential variation in latent factors at different timepoints of the day (e.g., fear of hypoglycemia in the daytime could be different to fear of hypoglycemia in the night-time). Finally, convergent and divergent validity were investigated by correlating question and scale scores with validated PROMs and participant characteristics.
For more details on statistical analyses see (S1 Text).
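As an illustration of the two variability statistics named above, the following Python sketch computes the RMSSD for one participant's daily scores and a one-way ICC across participants. It is a minimal sketch under stated assumptions, not the authors' R code: the function names are hypothetical, the ICC uses the one-way random-effects ANOVA estimator, and a balanced design (the same number of observations per participant) is assumed. The exact ICC variant used in the study is not stated in this section.

```python
from math import sqrt
from statistics import mean


def rmssd(scores):
    """Root mean squared successive difference of one participant's scores.

    A value of 0 means the participant gave the same response at every
    consecutive assessment.
    """
    diffs = [later - earlier for earlier, later in zip(scores, scores[1:])]
    if not diffs:
        raise ValueError("need at least two observations")
    return sqrt(mean([d * d for d in diffs]))


def icc_oneway(groups):
    """One-way random-effects ICC (assumed variant) for balanced data.

    `groups` holds one list of scores per participant; all lists must have
    the same length. Higher values mean more of the variance lies between
    participants than within them.
    """
    k = len(groups[0])                      # observations per participant
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("balanced data with >= 2 observations required")
    n = len(groups)                         # number of participants
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    # Illustrative values, not study data.
    print(rmssd([5, 5, 5]))
    print(rmssd([1, 3, 2]))
    print(icc_oneway([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Real check-in data are unbalanced because of missed check-ins, so a mixed-model variance decomposition would usually be preferred over this balanced-data estimator.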

Completion rates and acceptability
Participants completed 191±16 of 210 possible check-ins, a completion rate of 91%. Slight differences in completion between the morning, afternoon and evening check-ins were observed, with completion rates of 90%, 89% and 94%, respectively. When a check-in was submitted, all questions (except the work/productivity questions) were completed more than 99% of the time (Table 3). One question ("How well did you get along with other people today?") was skipped marginally more frequently (0.9%) than other questions. Questions in the work and productivity section were frequently skipped (range: 36-47%). Inspection of histograms revealed floor effects, indicated by more than 15% (range: 15-28%) of responses falling on the lowest score for the four negatively phrased questions ("How anxious do you feel right now?", "How irritable do you feel right now?", "How worried are you about having a hypo later today/while asleep?" and "How worried are you about having high blood glucose later today/while asleep?") across the three check-ins.
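The arithmetic behind these figures can be checked with a short Python sketch. It is illustrative only: the function names are hypothetical, the 210 possible check-ins are taken as three check-ins per day over the 70-day study period, and the floor-effect rule applies the "more than 15% of responses on the lowest score" criterion described above.

```python
CHECK_INS_PER_DAY = 3
STUDY_DAYS = 70  # 10 weeks
POSSIBLE_CHECK_INS = CHECK_INS_PER_DAY * STUDY_DAYS  # 210


def completion_rate(submitted, possible=POSSIBLE_CHECK_INS):
    """Proportion of possible check-ins that were submitted."""
    return submitted / possible


def has_floor_effect(responses, lowest=0, threshold=0.15):
    """True if more than `threshold` of responses sit on the lowest score."""
    share_at_floor = sum(1 for r in responses if r == lowest) / len(responses)
    return share_at_floor > threshold


if __name__ == "__main__":
    # Mean number of submitted check-ins reported in the text.
    print(round(100 * completion_rate(191)))
    # Illustrative responses on a 0-10 scale, not study data.
    print(has_floor_effect([0, 0, 5, 7, 10, 3, 0, 8, 9, 6]))
```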

Structural validity and internal consistency reliability
Based on kurtosis values and histograms, data were considered non-normally distributed. The ICC ranged from 0.15 to 0.85 and all but two questions (both relating to sleep) included at least one participant with zero variability across the study period (RMSSD = 0) (Table 3).
Exploring the data further revealed that one participant had zero variability across all questions (with the exception of the two sleep questions), but their data were still included in further analyses as removing them did not change conclusions. Of the non-work-related questions, the negatively phrased questions had the least day-to-day variability. Inter-question correlations were acceptable (rs = 0.20-0.81) and multicollinearity was absent (determinant range 0.000922-0.0134 across the three check-ins). Kaiser-Meyer-Olkin values ranged from 0.76 to 0.93 across all questions, suggesting good factorability. Applying the five-step approach by Huang [33] to the morning check-in suggested that a MCFA was appropriate (see S2 Table). The first model, based on the conceptual framework (model A in S3 Table), had good model fit on several model fit indices; however, at the between-person level, the SRMR>0.08 (morning and afternoon) indicated that model fit could be further improved. Inspection of correlation residuals (as well as modification indices) showed a large residual between the irritability and anxiety questions at both levels, suggesting that a relationship between these two was not captured in the original model. Combining the two into one scale improved (i.e., decreased) the between-SRMR model fit parameter across all three check-ins (model B in S3 Table). The new scale was labelled 'Negative affect' (and the single-question mood scale was labelled 'Overall mood'). Inspection of the internal consistencies (ω) of model B showed that ω was low for the fear of hypo-/hyperglycemia factor at the within-person level, with values ranging between 0.19 and 0.30. Therefore, it was decided not to combine these two questions in the same factor. This led to the final model (model C in S3 Table), which showed good model fit on several fit indices across the check-ins (CFI>0.95, RMSEA<0.06, SRMR<0.08).
In model C, standardized between-person factor loadings for all questions were >0.7 (Table 4), indicating that factors explained the grouping of the questions well, while at the within-person level, the majority of loadings for the two-question 'Negative affect' scale (across the three check-ins) were <0.7. There was a similar pattern across other scales, with satisfactory internal consistency (ω>0.7) at the between-person level but slightly lower (ω>0.5) for the 'Negative affect' and 'Cognitive functioning' scales at the within-person level (Table 4).
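To show how factor loadings relate to ω, the following Python sketch computes McDonald's ω for a single factor from standardized loadings. It is a simplified, single-level illustration and not the multi-level estimation used in the study: the function name is hypothetical, and it assumes a one-factor model with uncorrelated residuals, so that each item's residual variance is 1 minus its squared standardized loading.

```python
def mcdonald_omega(loadings):
    """McDonald's omega for one factor from standardized loadings.

    Assumptions (not taken from the source): single factor, uncorrelated
    residuals, standardized items, so residual variance = 1 - loading**2.
    """
    if not loadings:
        raise ValueError("at least one loading is required")
    common = sum(loadings) ** 2                      # variance due to the factor
    residual = sum(1 - lam ** 2 for lam in loadings)  # summed unique variance
    return common / (common + residual)


if __name__ == "__main__":
    # Illustrative loadings, not values from Table 4.
    print(mcdonald_omega([0.7, 0.7]))
    print(mcdonald_omega([0.9, 0.9]))
```

Under these assumptions a two-question scale with both loadings at 0.7 falls below the 0.7 cut-off for ω, which illustrates why short scales tend to show lower internal consistency.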

Test-retest reliability, convergent and divergent validity
Test-retest reliability and convergent and divergent validity were explored for each scale from model C. High test-retest correlations (rs = 0.76-0.94) were found across all two-question and single-question scales (Table 5). For convergent validity, the hypothesized pattern of correlations with PROMs was largely supported (Table 5), except for 'Energy level' (morning), 'Cognitive functioning' (morning, afternoon), 'Memory' (evening), 'Fear of hyperglycemia while asleep' (evening) and 'Social functioning' (evening). On the other hand, the PROMs measuring vitality and cognitive functioning did correlate highest with the respective app scales ('Energy level' and 'Cognitive functioning') compared to all other app scales (i.e., when reading vertically down the Table 5 columns). Divergent validity was evidenced by the lowest correlations between all app scales and the 'Financial situation (DIDP)' question, 'HbA1c' and 'Diabetes duration' (included solely for expected low correlations). However, many of the correlations in the remaining dark grey boxes in Table 5 (indicating other correlations that were expected to be low) were above rs = ±0.3 (e.g., in the morning between 'Overall mood' and 'Cognitive functioning', and between 'Negative affect' and 'Vitality'). The validated weekly work-productivity questionnaire (see S4 Table) showed a strong correlation (r>0.5) with 'number of hours worked', a moderate correlation (r>0.3) with 'productivity' questions, and low correlations (r<0.3) with hours missed from work and activities other than work on the app. Fig 1 provides an overview of the overall domains of daily functioning that the Hypo-METRICS app is believed to assess based on the psychometric analyses performed.
Table excerpt (column headers not recovered):
Sleep-quality score (T-score PROMIS week 10) 4: 49.77 ± 8.85 (n = 58); 50.49 ± 8.68
Vitality (SF-36 vitality subscale mean) 5: 3.35 ± 0.83 (n = 63); 3.37 ± 0.63
Diabetes distress (PAID total) 6
Data are mean ± SD or n (%). 1 Higher scores indicate greater perceived cognitive difficulties. 2 Higher scores indicate greater negative impact across global life dimensions (possible ranges from 1-7 and 1-100 for composite and percentage scores respectively). 3 Higher scores indicate higher fear of hypoglycemia. 4 Higher scores indicate higher sleep disturbance (lower sleep quality). 5 Higher scores indicate higher fatigue (less energetic). 6 (22)

Table 5 notes. Given that PROMs typically require respondents to reflect over a given period of time (e.g., 'past seven days'), daily app scores were averaged over a period of time corresponding to the PROM's recall period. For example, if the PROM's recall period was the 'past seven days', then the corresponding app scores were averaged across the final seven days of the study period and correlated with the PROM score. Hypotheses were made for each row (i.e., each app question/scale). In each row, the white cell indicates the correlation that was expected to be highest and the darkest cells indicate the correlations that were expected to be lowest for that question/scale. The lighter grey shading indicates correlations that were expected to fall between the highest and the lowest categories. For example, it was hypothesized that the "energy level" scale would correlate highest with the SF-36 vitality scale, followed by PROMs of related concepts (sleep, depression, anxiety, cognitive function), followed by less conceptually related concepts (diabetes distress, fear of hypoglycemia, financial situation, HbA1c, and diabetes duration).
1 Test-retest reliability was explored by correlating (Spearman's rho) average scores on each scale from week 3 (test condition) with average scores from week 8 (re-test). Week 3 was selected to allow a two-week run-in phase for participants to familiarize themselves with the app, and week 8 was selected to allow for an inter-test interval similar to other studies [34]. Correlations r>0.7 were considered suitable for demonstrating test-retest reliability.
2 First number represents correlations with the PAID total score.
3 Second number represents correlations with the PAID question "Worrying about the future and the possibility of serious complications?"
4 First number represents correlations with the PAID total score.
5 Second number represents correlations with the PAID question "Uncomfortable social situations related to your diabetes care (e.g., people telling you what to eat)?".
6 First number represents correlations with the DIDP question "Your financial situation".
7 Second number represents correlations with the DIDP question "Your relationship with your family, friends and peers?".
8 For these PROMs, there were no specified recall periods, so the average app scores from the last seven days of the study period were used.
Convergent validity was supported if Spearman correlations (between app scales and PROMs) were strong (rs>±0.5) or moderate (rs>±0.3), and divergent validity was supported if they were low (rs<±0.3). 'Financial situation (DIDP question)', 'HbA1c (mmol/mol)' and 'Diabetes duration' were included for exploring divergent validity.
Colours represent the hypothesized ranking of correlations: highest and at least rs > ±0.3 (white cells), medium (lighter grey cells), and lowest (darkest cells). Numbers in bold represent correlations where hypothesized highest correlations were confirmed.
All correlations are significant at p < 0.001, unless otherwise noted: * p<0.05, ns = not significant. Higher scores on all the Hypo-METRICS app scales indicate 'better' daily functioning. https://doi.org/10.1371/journal.pone.0283148.t005

Discussion
This study examined the acceptability and psychometric properties of an innovative smartphone app (Hypo-METRICS). The results support its use as a research tool to determine the impact of hypoglycemia on daily functioning among adults with T1DM or T2DM using insulin. Average completion rates were high and the percentage of skipped questions low. The Hypo-METRICS scales had satisfactory model fit (demonstrated by a MCFA and overall satisfactory ω values), high test-retest reliability and satisfactory convergent and divergent validity. Overall, these findings indicate that the novel Hypo-METRICS app is both valid and reliable for assessing the impact of hypoglycemia on daily functioning in research, with high ecological validity and low recall bias.
The high completion rates suggest that the Hypo-METRICS app is an acceptable instrument for assessments of daily functioning by people with T1DM and insulin-treated T2DM, up to three times per day, seven days per week for up to 10 weeks. All three check-ins were similarly acceptable, which may be attributed to the broad, flexible timeframes and to the fact that participants could select a convenient time for app completion. The low percentage of skipped questions (for the non-work-related questions) indicates that questions were generally applicable for most participants. The non-work-related question that was skipped the most was the 'Social functioning' question, which could be explained by the context of the COVID-19 pandemic (i.e., data collection occurred during a period of pandemic restrictions on social gatherings). The high percentage of work and productivity questions skipped was expected, as participants were instructed to skip these if they did not have paid employment or if it did not concern a workday. However, the question "How many hours did you miss from activities other than work today for ANY reason" does not require the participant to have a paid job, and the high skip rate could suggest that participants found the question difficult to respond to, difficult to understand, irrelevant, or poorly explained.
At the question level, the ICC values show that most questions, in particular those focused on worries about hypoglycemia and hyperglycemia, have greater variability between than within individuals. Further, RMSSD values show that for some participants and some questions (particularly negatively worded and work-related questions), there was no day-to-day variability in responses across the 70-day study period. This may suggest stability in the construct or in sample characteristics (e.g., low baseline depression/anxiety symptoms) and is supported by the floor effects on negatively worded questions (e.g., "How irritable do you feel right now?"). Alternative explanations could be that the questions were not capable of capturing variability in the construct in this group of participants, or that variability only occurs within days and not between days. The floor effects are not considered problematic, since it is not desirable, or possible, to reach lower scores than 'not at all', and most participants had variable responses to these questions over time. The low variability for some questions may also indicate 'automatic' or 'habitual' responding, wherein participants select the same responses when presented with the same questions in the same order multiple times [35]. Future studies could explore whether question randomization at each check-in would produce different results. The full range of the 0-10 scales was used for all app scales, suggesting that the 11-point length was appropriate; however, additional work is needed to explore minimal important changes on the scales [36].
The structural validity of the app scales was examined using a MCFA. Model C showed good model fit except on the TLI and Chi-square parameters. TLI values were >0.9, which has been considered an acceptable level [37]. The Chi-square test has been argued to provide an unrealistic null-hypothesis and the value is heavily influenced by sample size; therefore, it was considered less important in model selection [30]. The two adjustments made to the original model (model A) were supported by theory. The first adjustment was to move the 'Irritability' question (originally paired with 'Overall mood'), to form a two-question 'Negative affect' scale with the 'Anxiety' question. As irritability is a facet of mood [38], it was originally paired with mood. However, irritability and anxiety are closely related as they are both aspects of negative emotionality [39]. This latter pairing was better supported by the data. The second adjustment was to separate the 'Fear of hypoglycemia' and 'Fear of hyperglycemia' questions from an original two-question scale into two, single-question scores. Although these two constructs have previously been found to be significantly correlated [40], the low internal consistency suggests that these did not covary in the current dataset. Further, participants were, on average, more worried about 'highs' than 'lows', which has been observed clinically and elsewhere [24]. An alternative explanation could be that the variance for the two questions generally was too low to allow them to covary and correlate.
Internal consistency of all app scales was satisfactory (ω >0.7) at the between-person level, but not at the within-person level for 'Negative affect' (across all three check-ins) and 'Cognitive functioning' (evening check-in). Internal consistency is highly dependent on number of questions in the scale [15], and similar within-person ω-values for two-question scales have been reported in other EMA studies and found acceptable [41,42]. Low ω-values could also reflect greater question heterogeneity than in other pairs of questions, so for analysis at the within-person level only (e.g. N = 1 studies), researchers could consider analyzing single questions rather than scales [43].
EMA methods allow an exploration of the variation in outcomes from timepoint to timepoint. Expecting perfect test-retest reliability (correlations) between assessments contradicts the general assumption of the method [42]. However, if comparing aggregated data (e.g., averaged over a longer time period), representing a person's traits or general pattern of responding, one could expect more persistent scores across time [42]. This approach has been used in other EMA studies. For example, Csikszentmihalyi et al. reported that mean scores on variables measuring affect from the first part of a week correlated highly (r = 0.74) with scores from the second half of the week [42]. The aggregated Hypo-METRICS app scores showed high test-retest reliability, with correlations ranging from rs = 0.76 (for the one-question 'Overall mood' scale in the evening) to rs>0.9 (for the 'Fear of hypoglycemia' and 'Fear of hyperglycemia' single questions). These findings suggest reasonable consistency, across a few weeks, in the average scores on the measured constructs.
The correlations between the app scales and validated PROMs overall showed satisfactory convergent validity. The majority of the hypothesized highest correlations (indicative of convergent validity) and lowest correlations (indicative of divergent validity) were confirmed, although some of the hypothesized lowest correlations (e.g., for the 'Social functioning' app scale and the 'Anxiety (GAD-7)' PROM) were higher than anticipated. Correlations between app scales (aggregated over periods of 1-4 weeks) and validated PROMs were, in some cases, high (rs up to -0.70). However, it is important to note that no collinearity was present, suggesting that the app is not a redundant measure. Further, EMA offers advantages over retrospective questionnaires: it captures variation in the outcomes over time, and it allows assessment of the direct impact of events (here, episodes of hypoglycemia) on the outcomes. All app scales correlated highly with several PROMs, which was expected as previous studies have shown associations between symptoms of anxiety and depression [44], sleep and mood [45], and depressive symptoms and cognitive functioning [46]. However, some expected correlations were not confirmed in this dataset. A moderate-to-strong correlation was expected, but not confirmed, between the 'Social functioning' question in the app and the single questions on the DIDP and PAID scales, which refer to 'relationships with others' and 'uncomfortable social situations', respectively. However, unlike the PAID and DIDP questions, the 'Social functioning' question has no attribution to diabetes/hypoglycemia and is focused on a single day, which may explain the low correlations. Instead, the 'Social functioning' question correlated highly with the general anxiety questionnaire. Previous qualitative research has shown that anxieties about unpredictable hypoglycemic episodes limit social activities [47], which supports the strong link between social functioning and anxiety seen here.
Further work is required to establish the convergent validity of this question. Future research could also explore the meaning of this question from the perspective of the person with diabetes, using cognitive debriefing.
Evidence for convergent validity of work and productivity questions was mixed. High correlations on 'number of hours worked' and moderate correlations on 'productivity' questions suggest minimal recall bias on these questions when asked retrospectively for the previous seven days. Hours missed from work and 'activities other than work' showed very low correlations between daily and weekly measures, suggesting high recall bias in the PROM or that the questions in the app and the PROM were capturing different information.
A strength of this study is its innovative character, including use of advanced statistical methods suitable to explore the psychometric properties of an app for ecological momentary assessments. Factor analyses are often conducted on cross-sectional data, but when data are clustered with repeated measures per participant, use of standard techniques would violate the general assumption of independence between observations [33,48]. MCFA specifically enables a between- and within-person model to run simultaneously and makes it possible for the factor structure to vary across these levels [33]. Another strength of this study was the use of several validated PROMs to examine convergent and divergent validity, and the use of approximately matched time periods for correlations between short-form measures (app scales) and long-form measures (PROMs). Furthermore, psychometric properties could be confirmed in the first 100 participants of Hypo-METRICS: a sample including people with T1DM and T2DM, with a balanced gender distribution, and varied methods of glucose monitoring and levels of awareness of hypoglycemia. This sample struck a balance between being large enough to conduct the planned analyses and having data collection completed early enough to determine whether the app was 'fit for purpose' prior to analyses of central study objectives. The data were collected in "real" everyday life settings, thereby improving ecological validity with reduced recall burden.
A limitation of this study is that due to the substantial requirements of participants in the Hypo-METRICS study, the sample may reflect a highly motivated and relatively "high functioning" group [49]. Thus, the acceptability of the app needs to be explored in other samples, and/or by use of qualitative methods. Another potential limitation is that, despite participants receiving notifications for each check-in at certain times, there were wide time-intervals (six hours) in which participants could submit the check-ins. Allowing participants to complete check-ins at the most convenient time likely increased engagement with the app but may have biased responses towards more positive daily functioning. Future studies could explore how shorter time-intervals would impact on both completion rates and daily functioning scores.
This study investigated the psychometric properties of the majority of the Hypo-METRICS app questions in the three daily check-ins. However, there are some hypoglycemia-specific questions in the app that were not explored here, as these are not asked at each check-in but only if a hypoglycemic episode was reported (i.e., much higher percentage of missing data must be anticipated for these). Additional work is needed to determine the acceptability and psychometric properties of these questions. Similarly, this study should be replicated in independent samples with diverse characteristics (e.g., ethnic, socio-economic, and health-related). Further research is also needed to fully understand respondents' completion patterns, including potential predictors of completion. Qualitative research would enable a subjective evaluation of the app completion, including the perceived value and/or burden of using the app across many weeks and ways to improve user experience. Qualitative research would further allow for additional investigation of the content validity (relevance, comprehensiveness and comprehensibility) of the app questions. Future versions of the app could be automated and include conversational agents that, by combining CGM data with daily functioning data from the app, could deliver daily guidance on how to optimize treatment plans and/or improve quality of life.
Overall, these findings show that the Hypo-METRICS app is an acceptable, valid, and reliable tool for research to advance knowledge of how hypoglycemia impacts on daily functioning. In addition to its potential in research, the app may have utility in clinical practice to enhance personalized treatment and care for people with diabetes.
Supporting information S1